Depth map compression via 3D region-based representation
In 3D video, view synthesis is used to create new virtual views between
encoded camera views. Errors in the coding of the depth maps introduce
geometry inconsistencies in synthesized views. In this paper, a new 3D plane
representation of the scene is presented which improves the performance of
current standard video codecs in the view synthesis domain. Two image segmentation
algorithms are proposed for generating a color and depth segmentation.
Using both partitions, depth maps are segmented into regions free of
sharp discontinuities, without explicitly signaling all depth edges. The
resulting regions are represented using a planar model in the 3D world scene.
This 3D representation allows an efficient encoding while preserving the 3D
characteristics of the scene. The 3D planes open up the possibility to code
multiview images with a unique representation.
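Each depth region above is represented with a planar model in the 3D scene. As a minimal sketch of that idea, the following fits a plane by least squares to the points of one region; the function name and the explicit parameterization z = ax + by + c are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to an (n, 3) array of
    3D points belonging to one depth region. Returns (a, b, c)."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs

# Synthetic region lying exactly on the plane z = 2x - y + 3.
rng = np.random.default_rng(0)
xy = rng.random((100, 2))
pts = np.c_[xy, 2 * xy[:, 0] - xy[:, 1] + 3]
a, b, c = fit_plane(pts)
```

Encoding only three plane coefficients per region, instead of every depth sample, is what makes such a representation cheap to signal.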
2D-3D Geometric Fusion Network using Multi-Neighbourhood Graph Convolution for RGB-D Indoor Scene Classification
Multi-modal fusion has been shown to enhance the performance of scene
classification tasks. This paper presents a 2D-3D Fusion stage that combines 3D
Geometric Features with 2D Texture Features obtained by 2D Convolutional Neural
Networks. To get a robust 3D Geometric embedding, a network that uses two novel
layers is proposed. The first layer, Multi-Neighbourhood Graph Convolution,
aims to learn a more robust geometric descriptor of the scene combining two
different neighbourhoods: one in the Euclidean space and the other in the
Feature space. The second proposed layer, Nearest Voxel Pooling, improves the
performance of the well-known Voxel Pooling. Experimental results, using
NYU-Depth-V2 and SUN RGB-D datasets, show that the proposed method outperforms
the current state-of-the-art in the RGB-D indoor scene classification task.
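The Multi-Neighbourhood Graph Convolution combines a Euclidean-space neighbourhood with a feature-space neighbourhood. A hedged sketch of how such a combined edge set could be built (the union rule and function names are assumptions for illustration, not the paper's exact construction):

```python
import numpy as np

def knn_indices(x, k):
    """Indices of the k nearest neighbours of every row of x (self excluded)."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, 1:k + 1]

def multi_neighbourhood(xyz, feats, k):
    """Union of k-NN edge sets built in Euclidean (xyz) space and in
    feature space -- the two neighbourhoods the layer combines."""
    edges = set()
    for nbrs in (knn_indices(xyz, k), knn_indices(feats, k)):
        for i, row in enumerate(nbrs):
            edges.update((i, int(j)) for j in row)
    return edges
```

A feature-space neighbourhood lets semantically similar but spatially distant points exchange messages, which the Euclidean neighbourhood alone cannot provide.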
SkinningNet: two-stream graph convolutional neural network for skinning prediction of synthetic characters
This work presents SkinningNet, an end-to-end Two-Stream Graph Neural Network architecture that computes skinning weights from an input mesh and its associated skeleton, without making any assumptions on shape class and structure of the provided mesh. Whereas previous methods pre-compute handcrafted features that relate the mesh and the skeleton or assume a fixed topology of the skeleton, the proposed method extracts this information in an end-to-end learnable fashion by jointly learning the best relationship between mesh vertices and skeleton joints. The proposed method exploits the benefits of the novel Multi-Aggregator Graph Convolution that combines the results of different aggregators during the summarizing step of the Message-Passing scheme, helping the operation to generalize for unseen topologies. Experimental results demonstrate the effectiveness of the contributions of our novel architecture, with SkinningNet outperforming current state-of-the-art alternatives. This work has been partially supported by the project PID2020-117142GB-I00, funded by MCIN/AEI/10.13039/501100011033.
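The Multi-Aggregator Graph Convolution combines several aggregators in the summarizing step of message passing. A minimal sketch, assuming mean and max as the two aggregators and concatenation as the combination rule (the actual operators and combination in SkinningNet may differ):

```python
import numpy as np

def multi_aggregator(messages):
    """Summarize neighbour messages with two aggregators (mean and max)
    and concatenate the results -- the idea behind combining aggregators
    in the summarizing step of message passing."""
    return np.concatenate([messages.mean(axis=0), messages.max(axis=0)])

def message_passing_step(feats, neighbours):
    """One step: each node aggregates its neighbours' features.
    `neighbours[i]` lists the neighbour indices of node i."""
    return np.stack([multi_aggregator(feats[nbrs]) for nbrs in neighbours])
```

Using several aggregators hedges against any single one discarding information, which is one plausible reason the operation generalizes better to unseen topologies.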
Learning task-specific features for 3D pointcloud graph creation
Processing 3D pointclouds with Deep Learning methods is not an easy task. A
common choice is to do so with Graph Neural Networks, but this framework
involves creating edges between points that are not explicitly related
to one another. Historically, naive and handcrafted methods like k Nearest
Neighbors (k-NN) or query ball point over xyz features have been proposed,
focusing more attention on improving the network than improving the graph. In
this work, we propose a more principled way of creating a graph from a 3D
pointcloud. Our method is based on performing k-NN over a transformation of the
input 3D pointcloud. This transformation is done by a Multi-Layer Perceptron
(MLP) with learnable parameters that is optimized through backpropagation
jointly with the rest of the network. We also introduce a regularization method
based on stress minimization, which allows controlling how far the learnt
graph is from our baseline: k-NN over xyz space. This framework is tested on
ModelNet40, where graphs generated by our network outperformed the baseline by
0.3 points in overall accuracy.
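The pipeline above has three pieces: a learnable transformation of the input points, k-NN in the transformed space, and a stress term tying the learnt distances back to xyz distances. A hedged sketch under assumed shapes (the MLP size and the exact stress normalization are illustrative assumptions):

```python
import numpy as np

def transform(x, W1, W2):
    """Tiny two-layer MLP with ReLU; in the paper's setting its weights
    are optimized through backpropagation with the rest of the network."""
    return np.maximum(x @ W1, 0.0) @ W2

def knn_graph(z, k):
    """k-NN edges computed in the transformed space z (self excluded)."""
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, 1:k + 1]

def stress(z, xyz):
    """Stress between pairwise distances in the learnt space and in xyz
    space; minimizing it keeps the learnt graph close to the xyz k-NN
    baseline."""
    dz = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    dx = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    return float(((dz - dx) ** 2).sum() / 2)
```

When the transformation is the identity, the stress is zero and the graph reduces exactly to the k-NN-over-xyz baseline, which is what makes the regularizer an interpretable knob.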
Comparison of MPEG-7 descriptors for long term selection of reference frames
During the last years, the amount of multimedia content has greatly
increased. This has multiplied the need for efficient compression of
the content, but also the ability to search, retrieve, browse, or filter
it. Generally, video compression and indexing have been investigated
separately. However, as the amount of multimedia content
grows, it becomes increasingly interesting to study representations that, at the
same time, provide good compression and indexing functionalities.
Moreover, even if the indexing metadata is created for functionalities
such as search, retrieval, browsing, etc., it can also be employed
to increase the efficiency of current video codecs. Here, we use it
to improve the long term prediction step of the H.264/AVC video
codec. This paper focuses on the comparison between four different
MPEG-7 descriptors when used in the proposed scheme.
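Whatever MPEG-7 descriptor is chosen, the long-term selection step reduces to a nearest-descriptor search over stored frames. A minimal sketch, assuming descriptors are fixed-length vectors compared with an L2 distance (the actual matching metric per descriptor may differ):

```python
import numpy as np

def select_long_term_reference(current, candidates):
    """Pick the stored frame whose descriptor is closest (L2) to the
    current frame's descriptor. `candidates` holds one descriptor per
    row; returns the index of the chosen reference frame."""
    return int(np.argmin(np.linalg.norm(candidates - current, axis=1)))
```

The same metadata then serves double duty: it indexes the content for retrieval and steers the codec's long-term prediction toward the most similar reference.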
Gesture controlled interactive rendering in a panoramic scene
The demonstration described hereafter covers technical work
carried out in the FascinatE project [1], related to the interactive
retrieval and rendering of high-resolution panoramic scenes. The
scenes have been captured by a special panoramic camera (the
OMNICAM) [2], which captures high-resolution video with a
wide-angle (180 degrees) field of view. Users can access the
content by interacting based on a novel device-less and markerless
gesture-based system that allows them to interact as naturally
as possible, permitting the user to control the rendering of the
scene by zooming, panning or framing through the panoramic
scene.
Analysis of the economic dependence of the Republic of Ecuador on the People's Republic of China, 2008-2014
Ecuador has a primary-export production model, based on the extraction and sale of oil and traditional products, which conditions the country's economic development. Under the government of Econ. Rafael Correa, a political-economic model was introduced that aimed to achieve industrialization by changing the productive matrix. However, that objective was not reached. One of the causes is the strengthening of bilateral relations with China, which confirmed its position among Ecuador's main trading partners. Likewise, the flow of foreign direct investment increased, focused on the exploitation of mines and quarries. In addition, China became the main source of financing for the construction of emblematic Ecuadorian projects. At this point, China represents a strategic partner for the country. However, while the Asian country maintained constant growth, Ecuador did not consolidate its industrialization process. The evolution of bilateral relations along the three axes mentioned creates an economic dependence on China and thus confirms Ecuador's position as an underdeveloped country. This reality is studied through Dependency Theory, which provides clear guidelines for examining the stated problem.
Spatio-temporal road detection from aerial imagery using CNNs
The main goal of this paper is to detect roads from aerial imagery recorded by drones. To achieve this, we
propose a modification of SegNet, a deep fully convolutional neural network for image segmentation. In
order to train this neural network, we have put together a database containing videos of roads from the point
of view of a small commercial drone. Additionally, we have developed an image annotation tool based on
the watershed technique, in order to perform a semi-automatic labeling of the videos in this database. The
experimental results using our modified version of SegNet show a substantial improvement in the
network's performance on aerial imagery, obtaining over 90% accuracy.
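The accuracy figure quoted above is a per-pixel measure over the segmentation output. A minimal sketch of that metric, assuming binary road/non-road label maps (the paper may also report per-class figures):

```python
import numpy as np

def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted road/non-road label matches
    the ground-truth annotation produced by the labeling tool."""
    return float((pred == gt).mean())
```

With heavily imbalanced classes such as thin roads on large backgrounds, per-pixel accuracy can look optimistic, which is why segmentation work often reports IoU alongside it.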
Segmentation-based multi-scale edge extraction to measure the persistence of features in unorganized point clouds
Edge extraction has attracted considerable attention in computer vision, and accurate edge extraction in point clouds can be a significant asset in a variety of engineering scenarios. To this end, we propose a segmentation-based multi-scale edge extraction technique. In this approach, different regions of a point cloud are segmented by a global analysis according to geodesic distance. A multi-scale operator is then defined over local neighborhoods; by applying this operator at multiple scales of the point cloud, the persistence of features is determined. We illustrate the proposed method by computing a feature weight that measures the likelihood of a point being an edge, then detecting edge points based on that value at both global and local scales. We evaluate our method quantitatively and qualitatively. Experimental results show that the proposed approach achieves superior accuracy, and we demonstrate its robustness on noisier real-world datasets.
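One standard way to score edge likelihood from a local neighborhood, and to test its persistence across scales, is the surface-variation ratio of the local covariance. This sketch uses that classic measure as a stand-in; the paper's actual multi-scale operator and weighting scheme may differ:

```python
import numpy as np

def surface_variation(nbr_points):
    """Smallest-eigenvalue ratio of the local covariance: close to 0 on
    flat surfaces, larger near edges and corners."""
    centered = nbr_points - nbr_points.mean(axis=0)
    lam = np.sort(np.linalg.eigvalsh(centered.T @ centered / len(centered)))
    return lam[0] / lam.sum()

def multi_scale_edge_weight(points, idx, radii):
    """Average the variation over several neighbourhood radii, so only
    features that persist across scales receive a high weight."""
    weights = []
    for r in radii:
        nbrs = points[np.linalg.norm(points - points[idx], axis=1) < r]
        if len(nbrs) >= 4:
            weights.append(surface_variation(nbrs))
    return float(np.mean(weights)) if weights else 0.0
```

Averaging over scales is what filters noise-induced responses: a spurious edge fires at one radius but not at the others, while a true crease scores highly at every scale.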